
    A Harmonic Extension Approach for Collaborative Ranking

    We present a new perspective on graph-based methods for collaborative ranking in recommender systems. Unlike user-based or item-based methods that compute a weighted average of ratings given by the nearest neighbors, or low-rank approximation methods using convex optimization and the nuclear norm, we formulate matrix completion as a series of semi-supervised learning problems and globally propagate the known ratings to the missing ones on the user-user or item-item graph. The semi-supervised learning problems are expressed as Laplace-Beltrami equations on a manifold, also known as harmonic extension, and can be discretized by a point integral method. We show that our approach does not impose a low-rank Euclidean subspace on the data points but instead minimizes the dimension of the underlying manifold. Our method, named LDM (low dimensional manifold), turns out to be particularly effective at generating rankings of items, showing decent computational efficiency and robust ranking quality compared to state-of-the-art methods.
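To make "propagate the known ratings over the graph" concrete, here is a minimal sketch of harmonic extension on a small weighted graph: each unknown value is set to the weighted average of its neighbors, which is the discrete Laplace equation with the known ratings as boundary values. It uses plain Gauss-Seidel iteration on a hand-made graph and is not the point integral method or the LDM algorithm from the abstract; the function name `harmonic_extension`, the graph, and the iteration count are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def harmonic_extension(
    weights: Dict[Tuple[int, int], float],
    known: Dict[int, float],
    n: int,
    iters: int = 500,
) -> List[float]:
    """Fill unknown node values so each equals the weighted mean of its neighbors.

    weights: edge weights w[(i, j)], given once per undirected edge (hypothetical input format).
    known:   node index -> observed rating (fixed boundary values).
    """
    # Build a symmetric adjacency list from the edge weights.
    adj: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for (i, j), w in weights.items():
        adj[i].append((j, w))
        adj[j].append((i, w))

    # Initialize unknown nodes at the mean of the known ratings.
    mean = sum(known.values()) / len(known)
    values = [known.get(i, mean) for i in range(n)]

    # Gauss-Seidel sweeps: update each unknown node in place from its neighbors.
    for _ in range(iters):
        for i in range(n):
            if i in known or not adj[i]:
                continue
            total_w = sum(w for _, w in adj[i])
            values[i] = sum(w * values[j] for j, w in adj[i]) / total_w
    return values
```

For a path graph 0-1-2-3 with unit weights and known ratings 1.0 at node 0 and 4.0 at node 3, the filled-in values approach 2.0 and 3.0, i.e. a linear interpolation, which is what a harmonic function on a path looks like.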

    A New Web Search Engine with Learning Hierarchy

    Most existing web search engines (such as Google and Bing) are keyword-based. Typically, after the user issues a query with keywords, the search engine returns a flat list of results. When the query is related to a topic, keyword matching alone may not accurately retrieve the whole set of webpages on that topic. On the other hand, there exists another type of search system, particularly on e-commerce websites, where the user can search within the categories of different faceted hierarchies (e.g., product types and price ranges). Is it possible to integrate the two types of search systems and build a web search engine with a topic hierarchy? The main difficulty is how to classify the vast number of webpages on the Internet into the topic hierarchy. In this thesis, we leverage machine learning techniques to automatically classify webpages into the categories of our hierarchy, and then use the classification results to build a new search engine, SEE. The experimental results demonstrate that SEE achieves better search results than the traditional keyword-based search engine for most queries, particularly when the query is related to a topic. We also conduct a small-scale usability study, which further indicates that SEE is a promising search engine. To further improve SEE, we also propose a new active learning framework with several novel strategies for hierarchical classification.
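One way to picture how keyword search and a topic hierarchy can be combined is to restrict keyword matches to the subtree of the hierarchy a user selects, using the category each page was assigned by a classifier. The sketch below assumes pages have already been classified into hierarchy paths such as `("Computers", "Programming")`; the `Page` data model, the term-count scoring, and the name `search` are hypothetical and are not how SEE is implemented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Page:
    url: str
    text: str
    # Path in the topic hierarchy, root first (assumed output of a webpage classifier).
    category: Tuple[str, ...]


def keyword_score(page: Page, query: str) -> int:
    # Naive relevance: total occurrences of the query terms in the page text.
    words = page.text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def search(pages: List[Page], query: str, topic: Optional[Tuple[str, ...]] = None) -> List[str]:
    """Rank pages by keyword score, optionally restricted to one subtree of the hierarchy."""
    results = []
    for page in pages:
        # A page is under `topic` if the topic path is a prefix of its category path.
        if topic is not None and page.category[: len(topic)] != topic:
            continue
        score = keyword_score(page, query)
        if score > 0:
            results.append((score, page.url))
    # Highest score first; ties broken by URL for a deterministic order.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [url for _, url in results]
```

With one page about the Python language filed under `("Computers", "Programming")` and one about python snakes under `("Animals", "Reptiles")`, a plain query for "python" returns both, while the same query restricted to `("Animals",)` returns only the snake page. This illustrates the ambiguity a topic hierarchy can resolve for topical queries.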

    Combining Transfer of TTF-1 and Pax-8 Gene: a Potential Strategy to Promote Radioiodine Therapy of Thyroid Carcinoma

    Cotransfer of the TTF-1 and Pax-8 genes to tumor cells, resulting in the reexpression of iodide metabolism-associated proteins such as the sodium iodide symporter (NIS), thyroglobulin (Tg) and thyroperoxidase (TPO), offers the possibility of radioiodine therapy for non-iodide-concentrating tumors, because the expression of iodide metabolism-associated proteins in the thyroid is mediated by the thyroid transcription factors TTF-1 and Pax-8. The human TTF-1 and Pax-8 genes were transduced into human thyroid carcinoma cells (K1 and F133) by the recombinant adenoviruses AdTTF-1 and AdPax-8. Reexpression of NIS mRNA and protein, but not of TPO and Tg mRNA and protein, was detected in AdTTF-1-infected F133 cells, followed by increased radioiodine uptake (6.1-7.4-fold), scarce iodide organification and rapid iodide efflux (t1/2≈8 min in vitro, t1/2≈4.7 h in vivo).
In contrast, the synergistic effect of TTF-1 and Pax-8 induced reexpression of NIS, TPO and Tg mRNA and proteins in F133 cells. K1 and F133 cells coinfected with AdTTF-1 and AdPax-8 effectively accumulated radioiodine (6.6-7.5-fold) and showed markedly prolonged radioiodine retention (t1/2≈25-30 min in vitro, t1/2≈12 h in vivo) (p<0.05).
Accordingly, radioiodine therapy was more effective in TTF-1 and Pax-8 cotransduced K1 and F133 cells (21-25% survival rate in vitro) than in TTF-1-transduced cells (40% survival rate in vitro) (p<0.05). These results indicate that single TTF-1 gene transfer may have limited efficacy for radioiodine therapy because of rapid radioiodine efflux. Cotransduction of the TTF-1 and Pax-8 genes, with the resulting NIS-mediated radioiodine accumulation and TPO- and Tg-mediated radioiodine organification and intracellular retention, may lead to effective radioiodine therapy of thyroid carcinoma.

    Nonnegative matrix factorization for clustering

    This dissertation shows that nonnegative matrix factorization (NMF) can be extended to a general and efficient clustering method. Clustering is one of the fundamental tasks in machine learning. It is useful for unsupervised knowledge discovery in a variety of applications such as text mining and genomic analysis. NMF is a dimension reduction method that approximates a nonnegative matrix by the product of two lower-rank nonnegative matrices, and it has shown great promise as a clustering method when a data set is represented as a nonnegative data matrix. However, the challenges to the widespread use of NMF as a clustering method lie in its correctness and efficiency: first, we need to know why and when NMF can detect the true clusters and deliver good clustering quality; second, existing algorithms for computing NMF are expensive and often take longer than other clustering methods. We show that the original NMF can be improved in both respects in the context of clustering. Our new NMF-based clustering methods achieve better clustering quality and run orders of magnitude faster than the original NMF and other clustering methods. Like other clustering methods, NMF places an implicit assumption on the cluster structure. Thus, the success of NMF as a clustering method depends on whether the representation of the data in a vector space satisfies that assumption. Our approach to extending the original NMF to a general clustering method is to switch from the vector space representation of data points to a graph representation. The new formulation, called Symmetric NMF, takes a pairwise similarity matrix as input and can be viewed as a graph clustering method. We evaluate this method on document clustering and image segmentation problems and find that it achieves better clustering accuracy. In addition, for the original NMF, choosing the right number of clusters is difficult but important.
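The Symmetric NMF idea (approximate a similarity matrix A by H Hᵀ with H ≥ 0, then read each point's cluster off the largest entry in its row of H) can be sketched with the damped multiplicative update H ← H ∘ (1 − β + β · (AH) ⊘ (HHᵀH)), a rule known from the symmetric NMF literature with β = 1/2. This is a minimal pure-Python illustration under that assumption, not the dissertation's solver; the function names, β, the iteration count, and the toy matrix are illustrative choices.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    # Plain triple-loop matrix product X @ Y.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


def symnmf(A: Matrix, k: int, iters: int = 500, beta: float = 0.5, seed: int = 0) -> Matrix:
    """Approximate a symmetric nonnegative n x n matrix A by H H^T with H >= 0 (n x k)."""
    rng = random.Random(seed)
    n = len(A)
    # Strictly positive random start; multiplicative updates keep entries nonnegative.
    H = [[rng.random() for _ in range(k)] for _ in range(n)]
    eps = 1e-12  # guards against division by zero
    for _ in range(iters):
        AH = matmul(A, H)
        HHtH = matmul(H, matmul(transpose(H), H))  # computed as H (H^T H): k x k inner product
        H = [[H[i][j] * (1 - beta + beta * AH[i][j] / (HHtH[i][j] + eps))
              for j in range(k)] for i in range(n)]
    return H


def cluster_labels(H: Matrix) -> List[int]:
    # Each point goes to the column (cluster) with the largest entry in its row of H.
    return [max(range(len(row)), key=row.__getitem__) for row in H]
```

On a block-diagonal similarity matrix with two groups of three mutually similar points and zero similarity across groups, the resulting labels put each group in its own cluster. Computing H(HᵀH) instead of (HHᵀ)H avoids forming an n × n product, which matters once n is large.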
We show that consensus NMF, widely used in genomic analysis to choose the number of clusters, has critical flaws and can produce misleading results. We propose a variation of the prediction strength measure from statistical inference to evaluate the stability of clusters and select the right number of clusters. Our measure shows promising performance in artificial simulation experiments. Large-scale applications bring substantial efficiency challenges to existing algorithms for computing NMF. An important example is topic modeling, where users want to uncover the major themes in a large text collection. Our strategy for accelerating NMF-based clustering is to design algorithms that better suit the computer architecture and exploit the computing power of parallel platforms such as graphics processing units (GPUs). A key observation is that applying rank-2 NMF recursively, partitioning a data set into two clusters at each step, is much faster than applying the original NMF to obtain a flat clustering. We take advantage of a special property of rank-2 NMF and design an algorithm that runs faster than existing algorithms due to contiguous memory access. Combined with a criterion to stop the recursion, our hierarchical clustering algorithm runs significantly faster and achieves even better clustering quality than existing methods. Another bottleneck of NMF algorithms, which is also common in many other machine learning applications, is multiplying a large sparse data matrix by a tall-and-skinny dense matrix. We use GPUs to accelerate this routine for sparse matrices with an irregular sparsity structure. Overall, our algorithm shows significant improvement over popular topic modeling methods such as latent Dirichlet allocation, and runs more than 100 times faster on data sets with millions of documents. Ph.D.
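The "large sparse matrix times tall-and-skinny dense matrix" bottleneck mentioned above is the sparse-dense matrix product (SpMM). The sketch below shows the operation itself on a matrix stored in compressed sparse row (CSR) form, in plain sequential Python; it says nothing about the GPU implementation, and the function name and toy data are illustrative.

```python
from typing import List


def csr_spmm(indptr: List[int], indices: List[int], data: List[float],
             dense: List[List[float]]) -> List[List[float]]:
    """Multiply a CSR sparse matrix (n_rows x n_cols) by a dense n_cols x k matrix.

    indptr[i]:indptr[i + 1] delimits the stored nonzeros of row i;
    indices holds their column indices and data their values.
    """
    n_rows = len(indptr) - 1
    k = len(dense[0])
    out = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        row_out = out[i]
        # Only the stored nonzeros of row i contribute to output row i.
        for p in range(indptr[i], indptr[i + 1]):
            col, val = indices[p], data[p]
            dense_row = dense[col]
            # Accumulate val * (row `col` of the dense matrix) into the output row.
            for j in range(k):
                row_out[j] += val * dense_row[j]
    return out
```

For the sparse matrix [[1, 0, 2], [0, 0, 3]] (CSR arrays `indptr=[0, 2, 3]`, `indices=[0, 2, 2]`, `data=[1.0, 2.0, 3.0]`) times [[1, 2], [3, 4], [5, 6]], the result is [[11, 14], [15, 18]]. The work per output row is proportional to that row's number of nonzeros, so rows of very different lengths are what make the workload irregular when rows are distributed across parallel threads.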

    Crime Topic Modeling

    The classification of crime into discrete categories entails a massive loss of information. Crimes emerge out of a complex mix of behaviors and situations, yet most of these details cannot be captured by singular crime type labels. This information loss impairs not only our ability to understand the causes of crime but also our ability to develop optimal crime prevention strategies. We apply machine learning methods to short narrative text descriptions accompanying crime records with the goal of discovering ecologically more meaningful latent crime classes. We term these latent classes "crime topics" in reference to the text-based topic modeling methods that produce them. We use topic distributions to measure clustering among formally recognized crime types. Crime topics replicate broad distinctions between violent and property crime, but also reveal nuances linked to target characteristics, situational conditions, and the tools and methods of attack. Formal crime types are not discrete in topic space. Rather, crime types are distributed across a range of crime topics. Similarly, individual crime topics are distributed across a range of formal crime types. Key ecological groups include identity theft, shoplifting, burglary and theft, car crimes and vandalism, criminal threats and confidence crimes, and violent crimes. Though not a replacement for formal legal crime classifications, crime topics provide a unique window into the heterogeneous causal processes underlying crime. Comment: 47 pages, 4 tables, 7 figures
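The abstract does not specify how topic distributions are compared across crime types. As one plausible illustration (an assumption, not the paper's measure), each record's document-topic distribution can be averaged within its formal crime type, and the resulting profiles compared with the Jensen-Shannon divergence: types with a small divergence overlap heavily in topic space. The function names and the records used in the usage note are made up.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def mean_topic_profile(records: List[Tuple[str, List[float]]]) -> Dict[str, List[float]]:
    """Average the per-record topic distributions within each (hypothetical) crime type label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for crime_type, theta in records:
        if crime_type not in sums:
            sums[crime_type] = [0.0] * len(theta)
        sums[crime_type] = [s + t for s, t in zip(sums[crime_type], theta)]
        counts[crime_type] += 1
    return {c: [s / counts[c] for s in v] for c, v in sums.items()}


def kl_bits(p: List[float], q: List[float]) -> float:
    # Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical profiles, 1 for disjoint support."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)
```

With two made-up burglary records whose topic mixtures average to (0.7, 0.3, 0.0), one theft record at (0.7, 0.3, 0.0), and one assault record at (0.0, 0.1, 0.9), burglary and theft have divergence 0 while burglary and assault are far apart (about 0.84 bits). The JS divergence is used here because it is symmetric and bounded, which keeps pairwise comparisons between many crime types on one scale.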

    Examining the online reading behavior and performance of fifth-graders: evidence from eye-movement data

    Online reading is developing at an increasingly rapid rate, but the debate over whether learning is more effective with hypertexts than with traditional linear texts persists. In addition, several researchers have claimed that online reading comprehension always starts with a question, but little empirical evidence has been gathered to test this claim. This study used eye-tracking technology and the retrospective think-aloud technique to examine the online reading behaviors of fifth-graders (N = 50). The participants were asked to read four texts on a website. The study employed a three-way mixed design: 2 (reading ability: high vs. low) × 2 (reading goals: with vs. without) × 2 (text type: hypertext vs. linear text). The dependent variables were eye-movement indices and the frequencies of online reading strategy use. The results show that fifth-graders, irrespective of their reading ability, found it difficult to navigate the nonlinear structure of hypertexts when searching for and integrating information. When they read with goals, they adjusted their reading speed and the focus of their attention. Their offline reading ability also influenced their online reading performance. These results suggest that online reading skills and strategies need to be taught in order to enhance the online reading abilities of elementary-school students.

    NYCU-TWO at Memotion 3: Good Foundation, Good Teacher, then you have Good Meme Analysis

    This paper presents a robust solution to the Memotion 3.0 Shared Task. The goal of this task is to classify the emotion and the corresponding intensity expressed by memes, which usually take the form of images with short captions on social media. Understanding the multimodal features of the given memes is key to solving the task. In this work, we use CLIP to extract aligned image-text features and propose a novel meme sentiment analysis framework, consisting of a Cooperative Teaching Model (CTM) for Task A and a Cascaded Emotion Classifier (CEC) for Tasks B and C. CTM is based on the idea of knowledge distillation and can better predict the sentiment of a given meme in Task A; CEC leverages the emotion intensity suggested by the Task C prediction to classify the emotion more precisely in Task B. Experiments show that we achieved 2nd place in both Task A and Task B and 4th place in Task C, with weighted F1-scores of 0.342, 0.784, and 0.535, respectively. The results show the robustness and effectiveness of our framework. Our code is released on GitHub. Comment: De-Factify 2: Second Workshop on Multimodal Fact Checking and Hate Speech Detection, co-located with AAAI 202
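CTM is described only as "based on the idea of knowledge distillation". For readers unfamiliar with that idea, the sketch below shows the standard soft-target distillation loss: the teacher's and student's logits are softened with a temperature T, their KL divergence is scaled by T², and the result is mixed with ordinary cross-entropy on the hard label. The weighting `alpha`, the temperature `T`, and the function names are generic assumptions, not the paper's configuration.

```python
import math
from typing import List


def softmax(logits: List[float], T: float = 1.0) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      label: int, T: float = 2.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # Soft-target term: how far the softened student is from the softened teacher.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Hard-target term: ordinary cross-entropy at temperature 1.
    hard_ce = -math.log(softmax(student_logits)[label])
    return alpha * T * T * kl + (1 - alpha) * hard_ce
```

When student and teacher logits coincide, the KL term vanishes and only the weighted cross-entropy remains; a teacher that favors a different class adds a positive penalty. The T² factor is the usual way to keep the soft-target gradients on a comparable scale as T changes.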

    The 3D Genome Browser: A web-based browser for visualizing 3D genome organization and long-range chromatin interactions

    Here, we introduce the 3D Genome Browser, http://3dgenome.org, which allows users to conveniently explore both their own chromatin interaction data and over 300 publicly available chromatin interaction datasets of different types. We designed a new binary data format for Hi-C data that reduces the file size by at least an order of magnitude and allows users to visualize chromatin interactions over millions of base pairs within seconds. Our browser provides multiple methods for linking distal cis-regulatory elements with their potential target genes. Users can seamlessly integrate thousands of other omics datasets to gain a comprehensive view of both the regulatory landscape and 3D genome structure.